
kube-prometheus on Kubernetes 1.20

Download from the GitHub repository:

https://github.com/prometheus-operator/kube-prometheus

```bash
wget https://github.com/prometheus-operator/kube-prometheus/archive/refs/tags/v0.8.0.tar.gz
tar -zxf v0.8.0.tar.gz
cd kube-prometheus-0.8.0/manifests
```

All configuration files live under `manifests/`.

Configure persistent storage. Create a Ceph secret in the `monitoring` namespace; PVCs will use it to access Ceph:

```bash
kubectl create secret generic ceph-user-secret --type="kubernetes.io/rbd" \
  --from-literal=key=AQDlGKZgG2xRNxAA4DYniPBpaV5SAyU1/QH/5w== \
  --namespace=monitoring
```
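Kubernetes stores secret values base64-encoded in the Secret object's `data` field; a quick sketch of how the literal key above will appear when you read the secret back (round-trip only, no cluster access needed):

```python
import base64

# The literal value passed via --from-literal above.
key = "AQDlGKZgG2xRNxAA4DYniPBpaV5SAyU1/QH/5w=="

# kubectl stores it base64-encoded under data.key in the Secret object.
encoded = base64.b64encode(key.encode()).decode()
print(encoded)

# Decoding recovers the original value, e.g. when inspecting the Secret.
assert base64.b64decode(encoded).decode() == key
```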

In `grafana-deployment.yaml`, switch the storage from `emptyDir` to the PVC:

```yaml
      volumes:
      #- emptyDir: {}
      - name: grafana-storage
        persistentVolumeClaim:
          claimName: grafana-data
```

Append the PersistentVolumeClaim at the bottom of the file:

```yaml
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: grafana-data
  namespace: monitoring
spec:
  storageClassName: dynamic-ceph-rbd
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
```

In `prometheus-prometheus.yaml`, find the existing lines at the bottom of the spec:

```yaml
  serviceMonitorSelector: {}
  version: 2.26.0
```

Below them, add a storage section (indented under `spec`):

```yaml
  storage:
    volumeClaimTemplate:
      spec:
        storageClassName: dynamic-ceph-rbd
        resources:
          requests:
            storage: 50Gi
```
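To sanity-check the 50Gi request, the Prometheus docs give a sizing rule of thumb: needed disk ≈ retention time × ingested samples per second × bytes per sample, with roughly 1–2 bytes per sample after compression. The sample rate and retention below are illustrative assumptions, not measured values from this cluster:

```python
# Rule-of-thumb Prometheus disk sizing:
#   needed_bytes ≈ retention_seconds * samples_per_second * bytes_per_sample
# bytes_per_sample is typically 1-2 after compression; 2 is a safe upper bound.
def needed_disk_gib(retention_days: float, samples_per_second: float,
                    bytes_per_sample: float = 2.0) -> float:
    retention_seconds = retention_days * 24 * 3600
    return retention_seconds * samples_per_second * bytes_per_sample / 2**30

# Illustrative assumption: ~10k samples/s over the default 15-day retention.
print(round(needed_disk_gib(15, 10_000), 1))  # prints 24.1
```

At that assumed ingest rate, 50Gi leaves roughly 2x headroom.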

Deploy kube-prometheus:

```bash
kubectl apply -f manifests/setup/
kubectl apply -f manifests/
```

Check the deployment:

```bash
kubectl get pods -n monitoring
kubectl get svc -n monitoring
kubectl get ep -n monitoring
```

Expose Prometheus, Alertmanager, and Grafana through an Ingress:

```bash
cat > ingress-prometheus.yaml << EOF
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ingress-prometheus
  namespace: monitoring
  annotations:
    kubernetes.io/ingress.class: "nginx"
    prometheus.io/http_probe: "true"
spec:
  rules:
  - host: alert.localprom.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: alertmanager-main
            port:
              number: 9093
  - host: grafana.localprom.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: grafana
            port:
              number: 3000
  - host: prom.localprom.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: prometheus-k8s
            port:
              number: 9090
EOF
```

```bash
kubectl apply -f ingress-prometheus.yaml
kubectl get ing -n monitoring
```
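If the three hostnames are not in DNS, you can point them at the ingress controller from a client machine via `/etc/hosts` (the IP placeholder below is not from this setup; substitute your ingress controller's address):

```
# /etc/hosts — replace <INGRESS_IP> with your ingress controller's address
<INGRESS_IP>  alert.localprom.com  grafana.localprom.com  prom.localprom.com
```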

## Troubleshooting

The kube-state-metrics image fails to pull. Change the image address in `kube-state-metrics-deployment.yaml`:

```yaml
      containers:
      - args:
        - --host=127.0.0.1
        - --port=8081
        - --telemetry-host=127.0.0.1
        - --telemetry-port=8082
        image: bitnami/kube-state-metrics:2.0.0
        name: kube-state-metrics
```

Prometheus shows no data for ControllerManager and Scheduler. Edit the cluster configuration files `kube-controller-manager.conf` and `kube-scheduler.conf` to set `--bind-address=0.0.0.0`, then restart both services. Next, create a Service and Endpoints for kube-controller-manager and kube-scheduler:
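The `--bind-address` change can be sketched with `sed`; the file path and variable name below are hypothetical stand-ins for a binary-install flags file, so adapt them to your environment:

```shell
# Hypothetical flags file used for demonstration; real paths differ per install.
conf=/tmp/kube-controller-manager.conf
printf '%s\n' 'KUBE_OPTS="--bind-address=127.0.0.1 --leader-elect=true"' > "$conf"

# Rebind the metrics listener so Prometheus can reach it from other hosts.
sed -i 's/--bind-address=127\.0\.0\.1/--bind-address=0.0.0.0/' "$conf"
grep -- '--bind-address' "$conf"
```

After editing the real files, restart both services before creating the Services below.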

```bash
cat > kube-controller-manager-svc-ep.yaml << 'EOF'
apiVersion: v1
kind: Service
metadata:
  name: kube-controller-manager
  namespace: kube-system
  labels:
    app.kubernetes.io/name: kube-controller-manager
spec:
  type: ClusterIP
  clusterIP: None
  ports:
  - name: http-metrics
    port: 10252
    targetPort: 10252
    protocol: TCP
  - name: https-metrics
    port: 10257
    targetPort: 10257
    protocol: TCP
---
apiVersion: v1
kind: Endpoints
metadata:
  name: kube-controller-manager
  namespace: kube-system
  labels:
    app.kubernetes.io/name: kube-controller-manager
subsets:
- addresses:
  - ip: 192.168.2.101
  ports:
  - name: http-metrics
    port: 10252
    protocol: TCP
  - name: https-metrics
    port: 10257
    protocol: TCP
EOF
```

```bash
kubectl apply -f kube-controller-manager-svc-ep.yaml
kubectl get ep -n kube-system
```

Likewise for kube-scheduler:

```bash
cat > kube-scheduler-svc-ep.yaml << 'EOF'
apiVersion: v1
kind: Service
metadata:
  name: kube-scheduler
  namespace: kube-system
  labels:
    app.kubernetes.io/name: kube-scheduler
spec:
  type: ClusterIP
  clusterIP: None
  ports:
  - name: http-metrics
    port: 10251
    targetPort: 10251
    protocol: TCP
  - name: https-metrics
    port: 10259
    targetPort: 10259
    protocol: TCP
---
apiVersion: v1
kind: Endpoints
metadata:
  name: kube-scheduler
  namespace: kube-system
  labels:
    app.kubernetes.io/name: kube-scheduler
subsets:
- addresses:
  - ip: 192.168.2.101
  ports:
  - name: http-metrics
    port: 10251
    protocol: TCP
  - name: https-metrics
    port: 10259
    protocol: TCP
EOF
```

```bash
kubectl apply -f kube-scheduler-svc-ep.yaml
kubectl get ep -n kube-system
```



Note the labels: they must match the selectors in kube-prometheus's `kubernetes-serviceMonitorKubeScheduler.yaml` and `kubernetes-serviceMonitorKubeControllerManager.yaml`.
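A minimal sketch of why these labels matter: a ServiceMonitor's `matchLabels` selects targets by subset matching, so every selector key/value must appear on the Service's labels. The labels used here come from the manifests above; the real operator logic is more involved:

```python
# matchLabels selection is subset matching: every selector key/value
# must be present on the target object's labels.
def matches(selector: dict, labels: dict) -> bool:
    return all(labels.get(k) == v for k, v in selector.items())

# Labels set on the Service created above.
service_labels = {"app.kubernetes.io/name": "kube-controller-manager"}

# Selector from kubernetes-serviceMonitorKubeControllerManager.yaml.
selector = {"app.kubernetes.io/name": "kube-controller-manager"}

print(matches(selector, service_labels))           # True: labels line up
print(matches(selector, {"k8s-app": "kube-dns"}))  # False: no match, no data
```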

Edit `kubernetes-serviceMonitorKubeControllerManager.yaml` and `kubernetes-serviceMonitorKubeScheduler.yaml` so they scrape over plain HTTP:

```yaml
spec:
  endpoints:
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    interval: 30s
    port: http-metrics
    scheme: http
    tlsConfig:
      insecureSkipVerify: true
```



Apply the changes:

```bash
kubectl delete -f kubernetes-serviceMonitorKubeControllerManager.yaml
kubectl apply -f kubernetes-serviceMonitorKubeControllerManager.yaml

kubectl delete -f kubernetes-serviceMonitorKubeScheduler.yaml
kubectl apply -f kubernetes-serviceMonitorKubeScheduler.yaml
```


No data for CoreDNS. Check the labels CoreDNS currently has:

```bash
kubectl get ep kube-dns -n kube-system -o yaml | grep -A 5 'labels'
```

```yaml
  labels:
    addonmanager.kubernetes.io/mode: Reconcile
    k8s-app: kube-dns
    kubernetes.io/cluster-service: "true"
    kubernetes.io/name: CoreDNS
```


Edit `kubernetes-serviceMonitorCoreDNS.yaml` so the selector matches CoreDNS's actual labels:

```yaml
spec:
  endpoints:
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    interval: 15s
    port: metrics
  jobLabel: app.kubernetes.io/name
  namespaceSelector:
    matchNames:
    - kube-system
  selector:
    matchLabels:
      kubernetes.io/name: CoreDNS
```


Apply the changes:

```bash
kubectl delete -f kubernetes-serviceMonitorCoreDNS.yaml
kubectl apply -f kubernetes-serviceMonitorCoreDNS.yaml
```